Canonical correlation analysis is a widely used multivariate statistical technique for exploring the relation between two sets of variables. This paper considers the problem of estimating the leading canonical correlation directions in high-dimensional settings. Recently, under the assumption that the leading canonical correlation directions are sparse, various procedures have been proposed for many high-dimensional applications involving massive data sets. However, little theoretical justification has been available in the literature. In this paper, we establish rate-optimal nonasymptotic minimax estimation with respect to an appropriate loss function for a wide range of model spaces. Two interesting phenomena are observed. First, the minimax rates are not affected by the presence of nuisance parameters, namely the covariance matrices of the two sets of random variables, even though these matrices need to be estimated in the canonical correlation analysis problem. Second, we allow the presence of residual canonical correlation directions; under a mild condition on the eigengap, they do not influence the minimax rates. A generalized sin-theta theorem and an empirical process bound for Gaussian quadratic forms under a rank constraint are used to establish the minimax upper bounds, and these tools may be of independent interest.
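To make concrete why the two covariance matrices enter as nuisance parameters, here is a minimal sketch of the classical, low-dimensional CCA estimator of the leading canonical pair. It assumes the sample size is much larger than both dimensions, so the sample covariances are invertible. It is not the sparse, rate-optimal procedure studied in the paper. The function names `inv_sqrt` and `leading_cca` and the simulated data are illustrative assumptions.

```python
import numpy as np


def inv_sqrt(sym):
    """Inverse symmetric square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(sym)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def leading_cca(X, Y):
    """Classical (non-sparse) estimate of the leading canonical pair.

    X is n x p and Y is n x m, with rows as observations.
    Returns (rho, u, v) normalized so that u' Sx u = v' Sy v = 1.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sx = Xc.T @ Xc / n    # nuisance: covariance of X
    Sy = Yc.T @ Yc / n    # nuisance: covariance of Y
    Sxy = Xc.T @ Yc / n   # cross-covariance of X and Y
    Wx, Wy = inv_sqrt(Sx), inv_sqrt(Sy)
    # The singular values of the whitened cross-covariance are the
    # sample canonical correlations; the top singular pair gives the
    # leading directions in whitened coordinates.
    U, s, Vt = np.linalg.svd(Wx @ Sxy @ Wy)
    u = Wx @ U[:, 0]   # map back to original coordinates
    v = Wy @ Vt[0, :]
    return s[0], u, v


# Illustrative data: a shared latent factor z drives only the first
# coordinate of X and of Y, so the population leading canonical
# correlation is 1 / 2 and both directions point along coordinate 0.
rng = np.random.default_rng(0)
n, p, m = 2000, 5, 4
z = rng.standard_normal(n)
X = rng.standard_normal((n, p))
X[:, 0] += z
Y = rng.standard_normal((n, m))
Y[:, 0] += z
rho, u, v = leading_cca(X, Y)
```

The whitening step needs both covariance matrices even though only the directions `u` and `v` are of interest. When the dimensions exceed `n`, `Sx` and `Sy` are singular and this estimator breaks down, which is the regime the sparsity assumption addresses.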